Overview

Dataset statistics

Number of variables: 13
Number of observations: 368757
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 34.1 MiB
Average record size in memory: 97.0 B

Variable types

Text: 5
Numeric: 6
Categorical: 1
Boolean: 1

Alerts

target is highly overall correlated with target_log (High correlation)
target_log is highly overall correlated with target (High correlation)
Pool is highly imbalanced (50.5%) (Imbalance)
sqft is highly skewed (γ1 = 116.8531877) (Skewed)
target is highly skewed (γ1 = 25.15742786) (Skewed)
dist_sch_min is highly skewed (γ1 = 197.6766553) (Skewed)
sqft has 10848 (2.9%) zeros (Zeros)
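
The γ1 values in the alerts are skewness coefficients. As a rough illustration of why a few extreme values can inflate γ1, the sketch below computes the population (moment) form of skewness in plain Python. Treating this simplified form as close to what the profiler reports is an assumption, since sample-adjusted estimators differ slightly. The function name and toy data are hypothetical.

```python
def moment_skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data: the second list differs only by one extreme value.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

print(moment_skewness(symmetric))     # symmetric data gives zero skewness
print(moment_skewness(with_outlier))  # a single outlier makes it clearly positive
```

With columns such as sqft, where the maximum is thousands of times larger than the median, the same effect is far stronger.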

Reproduction

Analysis started: 2024-06-02 06:23:39.670241
Analysis finished: 2024-06-02 06:31:08.786484
Duration: 7 minutes and 29.12 seconds
Software version: ydata-profiling v4.8.3
Download configuration: config.json

Variables

status
Text

Distinct: 94
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 31
Median length: 7
Mean length: 7.3296995
Min length: 1

Characters and Unicode

Total characters: 2702878
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): < 0.1%

Sample

1st row: active
2nd row: forsale
3rd row: forsale
4th row: forsale
5th row: forsale

Value | Count | Frequency (%)
forsale | 197077 | 53.4%
active | 103224 | 28.0%
undefined | 38535 | 10.4%
foreclosure | 5991 | 1.6%
newconstruction | 5357 | 1.5%
pending | 4742 | 1.3%
pre-foreclosure | 2000 | 0.5%
undercontractshow | 1933 | 0.5%
p | 1484 | 0.4%
auction | 1292 | 0.4%
Other values (84) | 7122 | 1.9%

Most occurring characters

Value | Count | Frequency (%)
e | 419633 | 15.5%
a | 309463 | 11.4%
f | 246010 | 9.1%
o | 238693 | 8.8%
r | 233513 | 8.6%
s | 215199 | 8.0%
l | 206796 | 7.7%
i | 159367 | 5.9%
c | 138266 | 5.1%
t | 128732 | 4.8%
Other values (21) | 407206 | 15.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2702878 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2702878 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2702878 | 100.0%

propertyType
Text

Distinct: 164
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 46
Median length: 13
Mean length: 10.419843
Min length: 3

Characters and Unicode

Total characters: 3842390
Distinct characters: 42
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28
Unique (%): < 0.1%

Sample

1st row: single family
2nd row: single family
3rd row: single family
4th row: single family
5th row: lot/land

Value | Count | Frequency (%)
family | 195849 | 33.8%
single | 186831 | 32.3%
condo | 41920 | 7.2%
unknown | 33517 | 5.8%
lot/land | 19560 | 3.4%
townhouse | 18077 | 3.1%
multi | 12103 | 2.1%
land | 9941 | 1.7%
condo/townhome | 8253 | 1.4%
traditional | 6045 | 1.0%
Other values (193) | 46918 | 8.1%

Most occurring characters

Value | Count | Frequency (%)
l | 460715 | 12.0%
i | 423188 | 11.0%
n | 413556 | 10.8%
a | 259300 | 6.7%
e | 246293 | 6.4%
o | 246094 | 6.4%
m | 230243 | 6.0%
s | 218275 | 5.7%
(space) | 210257 | 5.5%
y | 202393 | 5.3%
Other values (32) | 932076 | 24.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3842390 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3842390 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3842390 | 100.0%

baths
Text

Distinct: 114
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 7
Median length: 1
Mean length: 3.192197
Min length: 1

Characters and Unicode

Total characters: 1177145
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: 3.5
2nd row: 3
3rd row: 2
4th row: 8
5th row: unknown

Value | Count | Frequency (%)
unknown | 103909 | 28.2%
2 | 83965 | 22.8%
3 | 53303 | 14.5%
4 | 21143 | 5.7%
2.0 | 16195 | 4.4%
2.5 | 12599 | 3.4%
3.0 | 10631 | 2.9%
1 | 10406 | 2.8%
5 | 7624 | 2.1%
1.0 | 5759 | 1.6%
Other values (92) | 43223 | 11.7%

Most occurring characters

Value | Count | Frequency (%)
n | 311727 | 26.5%
2 | 120771 | 10.3%
u | 103909 | 8.8%
k | 103909 | 8.8%
o | 103909 | 8.8%
w | 103909 | 8.8%
0 | 72533 | 6.2%
3 | 71784 | 6.1%
. | 62807 | 5.3%
5 | 41849 | 3.6%
Other values (8) | 80038 | 6.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1177145 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1177145 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1177145 | 100.0%
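
The baths column is stored as text, so values such as "2" and "2.0" are counted as separate entries and "unknown" sits alongside numeric strings. A minimal sketch of one possible normalization follows. The helper name is hypothetical, and mapping every unparseable string to None is an assumption about how the data should be cleaned, not something the report prescribes.

```python
def parse_baths(raw):
    """Return the bath count as a float, or None when the text is not numeric."""
    try:
        return float(raw)
    except ValueError:
        # Assumption: anything non-numeric (e.g. "unknown") is treated as missing.
        return None

# Values taken from the sample rows and the value table above.
samples = ["3.5", "3", "2", "2.0", "8", "unknown"]
parsed = [parse_baths(s) for s in samples]
print(parsed)
```

After parsing, "2" and "2.0" collapse into the same numeric value, which would reduce the number of distinct entries.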

city
Text

Distinct: 1864
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 38
Median length: 29
Mean length: 9.0020474
Min length: 1

Characters and Unicode

Total characters: 3319568
Distinct characters: 60
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 369
Unique (%): 0.1%

Sample

1st row: Southern Pines
2nd row: Spokane Valley
3rd row: Los Angeles
4th row: Dallas
5th row: Palm Bay

Value | Count | Frequency (%)
houston | 23907 | 4.8%
miami | 20472 | 4.1%
san | 18986 | 3.8%
antonio | 15229 | 3.1%
fort | 11307 | 2.3%
jacksonville | 10092 | 2.0%
charlotte | 9415 | 1.9%
beach | 8673 | 1.8%
dallas | 8517 | 1.7%
brooklyn | 7150 | 1.4%
Other values (1674) | 359772 | 72.9%

Most occurring characters

Value | Count | Frequency (%)
a | 346516 | 10.4%
o | 285836 | 8.6%
n | 256457 | 7.7%
e | 251948 | 7.6%
l | 223332 | 6.7%
i | 217604 | 6.6%
t | 198501 | 6.0%
s | 159868 | 4.8%
r | 157651 | 4.7%
(space) | 124803 | 3.8%
Other values (50) | 1097052 | 33.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3319568 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3319568 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3319568 | 100.0%

sqft
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 9877
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2583.1376
Minimum: -1
Maximum: 5728968
Zeros: 10848
Zeros (%): 2.9%
Negative: 39006
Negative (%): 10.6%
Memory size: 2.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 1049
median: 1665
Q3: 2471
95-th percentile: 4546
Maximum: 5728968
Range: 5728969
Interquartile range (IQR): 1422

Descriptive statistics

Standard deviation: 22781.321
Coefficient of variation (CV): 8.8192439
Kurtosis: 21345.53
Mean: 2583.1376
Median Absolute Deviation (MAD): 693
Skewness: 116.85319
Sum: 9.5255008 × 10^8
Variance: 5.1898858 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
-1 | 39006 | 10.6%
0 | 10848 | 2.9%
1200 | 1403 | 0.4%
1000 | 1009 | 0.3%
1500 | 988 | 0.3%
1800 | 967 | 0.3%
1100 | 923 | 0.3%
1400 | 889 | 0.2%
2000 | 857 | 0.2%
1600 | 822 | 0.2%
Other values (9867) | 311045 | 84.3%

Minimum 10 values

Value | Count | Frequency (%)
-1 | 39006 | 10.6%
0 | 10848 | 2.9%
1 | 76 | < 0.1%
2 | 6 | < 0.1%
3 | 2 | < 0.1%
4 | 1 | < 0.1%
5 | 2 | < 0.1%
6 | 1 | < 0.1%
10 | 2 | < 0.1%
11 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
5728968 | 1 | < 0.1%
4356000 | 2 | < 0.1%
2807917 | 2 | < 0.1%
2613600 | 1 | < 0.1%
2585006 | 2 | < 0.1%
1916640 | 1 | < 0.1%
1761113 | 1 | < 0.1%
1611720 | 1 | < 0.1%
1598652 | 1 | < 0.1%
1524600 | 1 | < 0.1%
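
The sqft column has no missing cells, yet 10.6% of its values are -1 and 2.9% are 0, which suggests placeholder values rather than real areas. The sketch below shows how such placeholders shift a summary statistic. It assumes that -1 and 0 both stand for "unknown", which the report itself does not state, and the helper name is hypothetical. The input values are the sqft entries of the first ten sample rows.

```python
import statistics

def drop_placeholders(values):
    """Keep only strictly positive areas (assumption: -1 and 0 mean 'unknown')."""
    return [v for v in values if v > 0]

# sqft values from the first ten sample rows of the report.
sqft = [2900, 1947, 3000, 6457, -1, 897, 1507, -1, 3588, 1930]

raw_median = statistics.median(sqft)
clean_median = statistics.median(drop_placeholders(sqft))
print(raw_median, clean_median)
```

On this small sample the median rises once the placeholders are excluded, so statistics such as the 5-th percentile of -1 reported above are best read with the placeholder share in mind.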

zipcode
Real number (ℝ)

Distinct: 4258
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51492.224
Minimum: 1103
Maximum: 331446
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 MiB

Quantile statistics

Minimum: 1103
5-th percentile: 11234
Q1: 32833
median: 37205
Q3: 77382
95-th percentile: 95403
Maximum: 331446
Range: 330343
Interquartile range (IQR): 44549

Descriptive statistics

Standard deviation: 26863.544
Coefficient of variation (CV): 0.52170099
Kurtosis: -1.30296
Mean: 51492.224
Median Absolute Deviation (MAD): 17195
Skewness: 0.29588501
Sum: 1.8988118 × 10^10
Variance: 7.2165002 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
32137 | 2097 | 0.6%
33131 | 1542 | 0.4%
34747 | 1471 | 0.4%
78245 | 1360 | 0.4%
33137 | 1298 | 0.4%
33132 | 1294 | 0.4%
78253 | 1253 | 0.3%
34759 | 1241 | 0.3%
78254 | 1212 | 0.3%
33130 | 1155 | 0.3%
Other values (4248) | 354834 | 96.2%

Minimum 10 values

Value | Count | Frequency (%)
1103 | 1 | < 0.1%
1104 | 10 | < 0.1%
1105 | 10 | < 0.1%
1106 | 1 | < 0.1%
1107 | 2 | < 0.1%
1108 | 15 | < 0.1%
1109 | 21 | < 0.1%
1118 | 8 | < 0.1%
1119 | 7 | < 0.1%
1128 | 5 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
331446 | 1 | < 0.1%
123456 | 1 | < 0.1%
112229 | 1 | < 0.1%
99338 | 103 | < 0.1%
99337 | 146 | < 0.1%
99336 | 126 | < 0.1%
99224 | 122 | < 0.1%
99223 | 93 | < 0.1%
99218 | 33 | < 0.1%
99217 | 82 | < 0.1%
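
Because zipcode is typed as a real number, four-digit values such as 1103 have presumably lost a leading zero, and six-digit values such as 331446 cannot be five-digit ZIP codes at all. A minimal sketch of restoring a string representation follows. Both the leading-zero interpretation and the decision to reject anything above five digits are assumptions, and the helper name is hypothetical.

```python
def normalize_zip(value):
    """Return a zero-padded 5-character ZIP string, or None if out of range."""
    number = int(value)
    if number < 0 or number > 99999:
        # Assumption: values with more than five digits are data-entry errors.
        return None
    return f"{number:05d}"

# Values taken from the minimum and maximum tables above.
for raw in (1103, 99338, 123456, 331446):
    print(raw, normalize_zip(raw))
```

Keeping ZIP codes as strings also stops the profiler from computing means and variances that have no meaning for an identifier.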

state
Categorical

Distinct: 38
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Value | Count
FL | 113015
TX | 81529
NY | 24002
CA | 23094
NC | 21388
Other values (33) | 105729

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 737514
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: NC
2nd row: WA
3rd row: CA
4th row: TX
5th row: FL

Common Values

Value | Count | Frequency (%)
FL | 113015 | 30.6%
TX | 81529 | 22.1%
NY | 24002 | 6.5%
CA | 23094 | 6.3%
NC | 21388 | 5.8%
TN | 17673 | 4.8%
WA | 13563 | 3.7%
OH | 12282 | 3.3%
IL | 8799 | 2.4%
NV | 8335 | 2.3%
Other values (28) | 45077 | 12.2%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
fl | 113016 | 30.6%
tx | 81529 | 22.1%
ny | 24002 | 6.5%
ca | 23094 | 6.3%
nc | 21388 | 5.8%
tn | 17673 | 4.8%
wa | 13563 | 3.7%
oh | 12282 | 3.3%
il | 8799 | 2.4%
nv | 8335 | 2.3%
Other values (27) | 45076 | 12.2%

Most occurring characters

Value | Count | Frequency (%)
L | 121815 | 16.5%
F | 113016 | 15.3%
T | 101348 | 13.7%
X | 81529 | 11.1%
N | 75047 | 10.2%
C | 55340 | 7.5%
A | 54449 | 7.4%
Y | 24089 | 3.3%
O | 22186 | 3.0%
I | 17807 | 2.4%
Other values (16) | 70888 | 9.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 737514 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 737514 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 737514 | 100.0%

target
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 34219
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 649131.57
Minimum: 1
Maximum: 1.95 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 39000
Q1: 189310
median: 324750
Q3: 587500
95-th percentile: 1975000
Maximum: 1.95 × 10^8
Range: 1.95 × 10^8
Interquartile range (IQR): 398190

Descriptive statistics

Standard deviation: 1848210.2
Coefficient of variation (CV): 2.8472044
Kurtosis: 1346.3293
Mean: 649131.57
Median Absolute Deviation (MAD): 169750
Skewness: 25.157428
Sum: 2.3937181 × 10^11
Variance: 3.4158811 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
225000 | 1778 | 0.5%
350000 | 1632 | 0.4%
275000 | 1630 | 0.4%
250000 | 1606 | 0.4%
325000 | 1546 | 0.4%
399000 | 1528 | 0.4%
299900 | 1522 | 0.4%
249900 | 1480 | 0.4%
299000 | 1439 | 0.4%
450000 | 1428 | 0.4%
Other values (34209) | 353168 | 95.8%

Minimum 10 values

Value | Count | Frequency (%)
1 | 15 | < 0.1%
3 | 2 | < 0.1%
8 | 1 | < 0.1%
20 | 1 | < 0.1%
25 | 1 | < 0.1%
29 | 1 | < 0.1%
30 | 1 | < 0.1%
250 | 1 | < 0.1%
393 | 1 | < 0.1%
400 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
195000000 | 1 | < 0.1%
165000000 | 2 | < 0.1%
150000000 | 1 | < 0.1%
129000000 | 1 | < 0.1%
115000000 | 2 | < 0.1%
110000000 | 2 | < 0.1%
98000000 | 1 | < 0.1%
88000000 | 1 | < 0.1%
87000000 | 1 | < 0.1%
85000000 | 1 | < 0.1%

Pool
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 360.2 KiB

Value | Count | Frequency (%)
False | 328840 | 89.2%
True | 39917 | 10.8%
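
The 50.5% imbalance figure in the Pool alert is not the share of either class. It is consistent with a score of one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for two classes. The sketch below recomputes it from the counts above. That the profiler uses exactly this formula is an assumption, although the result matches the reported value. The function name is hypothetical.

```python
import math

def imbalance_score(counts):
    """1 - H / H_max, where H is the Shannon entropy (base 2) of the class shares."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    max_entropy = math.log2(len(counts))  # entropy of a perfectly balanced column
    return 1 - entropy / max_entropy

# Class counts for Pool taken from the table above: False, True.
score = imbalance_score([328840, 39917])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition and a column with a single value approaches 1.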

Year built
Text

Distinct: 221
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1475028
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: 2019
2nd row: 2019
3rd row: 1961
4th row: 2006
5th row: None

Value | Count | Frequency (%)
none | 59798 | 16.2%
2019 | 30904 | 8.4%
2006 | 7925 | 2.1%
2005 | 7428 | 2.0%
2007 | 7065 | 1.9%
2018 | 6731 | 1.8%
2004 | 5461 | 1.5%
2017 | 5119 | 1.4%
2016 | 5064 | 1.4%
2008 | 4953 | 1.3%
Other values (211) | 228309 | 61.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 283437 | 19.2%
9 | 270242 | 18.3%
0 | 214884 | 14.6%
2 | 157027 | 10.6%
8 | 64099 | 4.3%
5 | 62249 | 4.2%
N | 59798 | 4.1%
o | 59798 | 4.1%
n | 59798 | 4.1%
e | 59798 | 4.1%
Other values (4) | 183898 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1475028 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1475028 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1475028 | 100.0%

r_sch_mean
Real number (ℝ)

Distinct: 79
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8407444
Minimum: -1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 5461
Negative (%): 1.5%
Memory size: 2.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 2
Q1: 3.3
median: 4.8
Q3: 6.3
95-th percentile: 8
Maximum: 9
Range: 10
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.9693033
Coefficient of variation (CV): 0.40681827
Kurtosis: -0.060039204
Mean: 4.8407444
Median Absolute Deviation (MAD): 1.5
Skewness: -0.11084441
Sum: 1785058.4
Variance: 3.8781554
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
5 | 23487 | 6.4%
4 | 21855 | 5.9%
6 | 20206 | 5.5%
3 | 19323 | 5.2%
3.3 | 14948 | 4.1%
6.3 | 14861 | 4.0%
4.7 | 14754 | 4.0%
3.7 | 14133 | 3.8%
7 | 13914 | 3.8%
5.7 | 13723 | 3.7%
Other values (69) | 197553 | 53.6%

Minimum 10 values

Value | Count | Frequency (%)
-1 | 5461 | 1.5%
1 | 2232 | 0.6%
1.2 | 18 | < 0.1%
1.3 | 1067 | 0.3%
1.4 | 12 | < 0.1%
1.5 | 1948 | 0.5%
1.6 | 72 | < 0.1%
1.7 | 2565 | 0.7%
1.8 | 281 | 0.1%
1.9 | 24 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
9 | 7103 | 1.9%
8.8 | 242 | 0.1%
8.7 | 2335 | 0.6%
8.6 | 169 | < 0.1%
8.5 | 2993 | 0.8%
8.4 | 276 | 0.1%
8.3 | 2809 | 0.8%
8.2 | 1404 | 0.4%
8 | 11435 | 3.1%
7.9 | 1 | < 0.1%

dist_sch_min
Real number (ℝ)

SKEWED 

Distinct: 1539
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1919819
Minimum: -1
Maximum: 1590.38
Zeros: 761
Zeros (%): 0.2%
Negative: 4256
Negative (%): 1.2%
Memory size: 2.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 0.1
Q1: 0.34
median: 0.67
Q3: 1.3
95-th percentile: 3.62
Maximum: 1590.38
Range: 1591.38
Interquartile range (IQR): 0.96

Descriptive statistics

Standard deviation: 5.4420641
Coefficient of variation (CV): 4.5655594
Kurtosis: 49970.675
Mean: 1.1919819
Median Absolute Deviation (MAD): 0.38
Skewness: 197.67666
Sum: 439551.67
Variance: 29.616062
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0.3 | 22015 | 6.0%
0.4 | 20522 | 5.6%
0.2 | 19862 | 5.4%
0.5 | 18396 | 5.0%
0.6 | 15751 | 4.3%
0.7 | 13490 | 3.7%
0.1 | 12336 | 3.3%
0.8 | 10824 | 2.9%
0.9 | 9415 | 2.6%
1.1 | 7369 | 2.0%
Other values (1529) | 218777 | 59.3%

Minimum 10 values

Value | Count | Frequency (%)
-1 | 4256 | 1.2%
0 | 761 | 0.2%
0.01 | 1 | < 0.1%
0.02 | 23 | < 0.1%
0.03 | 119 | < 0.1%
0.04 | 183 | < 0.1%
0.05 | 342 | 0.1%
0.06 | 405 | 0.1%
0.07 | 501 | 0.1%
0.08 | 543 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1590.38 | 1 | < 0.1%
1590.36 | 1 | < 0.1%
1187.14 | 1 | < 0.1%
725.21 | 1 | < 0.1%
725.2 | 1 | < 0.1%
725.19 | 2 | < 0.1%
725.17 | 1 | < 0.1%
460.86 | 1 | < 0.1%
312.4 | 1 | < 0.1%
117.8 | 1 | < 0.1%

target_log
Real number (ℝ)

HIGH CORRELATION 

Distinct: 34219
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.653457
Minimum: 0
Maximum: 19.08851
Zeros: 15
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 10.571317
Q1: 12.151141
median: 12.690811
Q3: 13.283632
95-th percentile: 14.496079
Maximum: 19.08851
Range: 19.08851
Interquartile range (IQR): 1.1324904

Descriptive statistics

Standard deviation: 1.1973912
Coefficient of variation (CV): 0.094629569
Kurtosis: 3.6717131
Mean: 12.653457
Median Absolute Deviation (MAD): 0.56324052
Skewness: -0.6858503
Sum: 4666051
Variance: 1.4337457
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
12.32385568 | 1778 | 0.5%
12.76568843 | 1632 | 0.4%
12.52452638 | 1630 | 0.4%
12.4292162 | 1606 | 0.4%
12.69158046 | 1546 | 0.4%
12.8967167 | 1528 | 0.4%
12.61120436 | 1522 | 0.4%
12.42881612 | 1480 | 0.4%
12.60819885 | 1439 | 0.4%
13.01700286 | 1428 | 0.4%
Other values (34209) | 353168 | 95.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 15 | < 0.1%
1.098612289 | 2 | < 0.1%
2.079441542 | 1 | < 0.1%
2.995732274 | 1 | < 0.1%
3.218875825 | 1 | < 0.1%
3.36729583 | 1 | < 0.1%
3.401197382 | 1 | < 0.1%
5.521460918 | 1 | < 0.1%
5.973809612 | 1 | < 0.1%
5.991464547 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
19.08851012 | 1 | < 0.1%
18.92145603 | 2 | < 0.1%
18.82614585 | 1 | < 0.1%
18.67532296 | 1 | < 0.1%
18.56044269 | 2 | < 0.1%
18.51599092 | 2 | < 0.1%
18.40047804 | 1 | < 0.1%
18.29284737 | 1 | < 0.1%
18.28141868 | 1 | < 0.1%
18.25816181 | 1 | < 0.1%
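
The target_log values line up with the natural logarithm of target: the most common target (225000) and the most common target_log (12.32385568) share the same count, the minimum target of 1 maps to 0, and the maximum of 195000000 maps to 19.08851012. The sketch below checks those three pairs. It only verifies the pairs quoted from the tables above, and the variable names are hypothetical.

```python
import math

# (target, reported target_log) pairs read from the tables in this report.
pairs = [
    (1, 0.0),
    (225000, 12.32385568),
    (195000000, 19.08851012),
]

# Largest absolute gap between ln(target) and the reported target_log.
max_abs_error = max(abs(math.log(target) - reported) for target, reported in pairs)
print(max_abs_error)
```

The gap stays within rounding of the reported digits, which supports reading target_log as ln(target) rather than, for example, log1p or a base-10 logarithm.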

Interactions

[Figure: interaction plots for pairs of numeric variables]

Correlations

[Figure: correlation heatmap]

 | Pool | dist_sch_min | r_sch_mean | sqft | state | target | target_log | zipcode
Pool | 1.000 | 0.101 | 0.106 | 0.154 | 0.199 | 0.166 | 0.166 | -0.006
dist_sch_min | 0.101 | 1.000 | 0.128 | 0.043 | 0.000 | -0.102 | -0.102 | -0.024
r_sch_mean | 0.106 | 0.128 | 1.000 | 0.221 | 0.205 | 0.305 | 0.305 | 0.073
sqft | 0.154 | 0.043 | 0.221 | 1.000 | 0.007 | 0.499 | 0.499 | 0.128
state | 0.199 | 0.000 | 0.205 | 0.007 | 1.000 | -0.070 | -0.070 | 0.269
target | 0.166 | -0.102 | 0.305 | 0.499 | -0.070 | 1.000 | 1.000 | 0.007
target_log | 0.166 | -0.102 | 0.305 | 0.499 | -0.070 | 1.000 | 1.000 | 0.007
zipcode | -0.006 | -0.024 | 0.073 | 0.128 | 0.269 | 0.007 | 0.007 | 1.000
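
The target and target_log rows are identical and their mutual coefficient is exactly 1.000. That is what a rank-based coefficient gives for any strictly increasing transform such as a logarithm, whereas a linear (Pearson) coefficient on the raw values would fall below 1. The sketch below illustrates the difference using six target values from the sample rows. Whether the report's matrix was computed with Spearman's coefficient is an assumption, and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank positions (1 = smallest); assumes no ties, which holds for this sample."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Six distinct target values from the sample rows of this report.
targets = [418000, 310000, 2895000, 2395000, 5000, 209000]
logs = [math.log(t) for t in targets]

print(spearman(targets, logs))  # rank-based: the log transform preserves order
print(pearson(targets, logs))   # linear: noticeably below 1 on the raw scale
```

Since one column is a deterministic transform of the other, only one of the two would normally be kept as a modelling target.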

Missing values

[Figure: count of non-missing values per column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

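First rows

 | status | propertyType | baths | city | sqft | zipcode | state | target | Pool | Year built | r_sch_mean | dist_sch_min | target_log
0 | active | single family | 3.5 | Southern Pines | 2900 | 28387 | NC | 418000 | False | 2019 | 5.2 | 2.70 | 12.943237
1 | forsale | single family | 3 | Spokane Valley | 1947 | 99216 | WA | 310000 | False | 2019 | 4.0 | 1.01 | 12.644328
2 | forsale | single family | 2 | Los Angeles | 3000 | 90049 | CA | 2895000 | True | 1961 | 6.7 | 1.19 | 14.878496
3 | forsale | single family | 8 | Dallas | 6457 | 75205 | TX | 2395000 | False | 2006 | 9.0 | 0.10 | 14.688894
4 | forsale | lot/land | unknown | Palm Bay | -1 | 32908 | FL | 5000 | False | None | 4.7 | 3.03 | 8.517193
5 | forsale | townhouse | unknown | Philadelphia | 897 | 19145 | PA | 209000 | False | 1920 | -1.0 | -1.00 | 12.250090
6 | active | florida | unknown | Poinciana | 1507 | 34759 | FL | 181500 | False | 2006 | 2.3 | 0.80 | 12.109011
7 | active | unknown | unknown | Memphis | -1 | 38115 | TN | 68000 | False | 1976 | 2.7 | 0.40 | 11.127263
8 | active | single family | 2 | Mason City | 3588 | 50401 | IA | 244900 | False | 1970 | 3.8 | 5.60 | 12.408605
9 | undefined | single family | 3 | Houston | 1930 | 77080 | TX | 311995 | False | 2019 | 3.0 | 0.60 | 12.650742

Last rows

 | status | propertyType | baths | city | sqft | zipcode | state | target | Pool | Year built | r_sch_mean | dist_sch_min | target_log
368747 | forsale | single family | 3 | Houston | 1792 | 77080 | TX | 280000 | False | 1970 | 2.7 | 0.19 | 12.542545
368748 | undefined | single family | 2.0 | Orlando | 1829 | 32805 | FL | 171306 | False | 1962 | 2.3 | 1.10 | 12.051207
368749 | active | single detached | unknown | Fort Worth | 1895 | 76110 | TX | 199900 | False | 1921 | 5.0 | 0.50 | 12.205573
368750 | undefined | single family | 2 | Houston | 1841 | 77089 | TX | 252990 | False | 2019 | 6.0 | 0.30 | 12.441105
368751 | forsale | condo | 3 | Washington | 1417 | 20001 | DC | 799000 | False | 2010 | 3.0 | 0.10 | 13.591116
368752 | undefined | single family | 6.0 | Miami | 4017 | 33180 | FL | 1249000 | True | 1990 | 5.0 | 1.10 | 14.037854
368753 | forsale | condo | 3 | Chicago | 2000 | 60657 | IL | 674999 | False | 1924 | 4.3 | 0.40 | 13.422466
368754 | forsale | single family | 3 | Jamaica | 1152 | 11434 | NY | 528000 | False | 1950 | 4.5 | 0.48 | 13.176852
368755 | undefined | unknown | unknown | Houston | -1 | 77028 | TX | 34500 | False | None | -1.0 | 0.50 | 10.448715
368756 | undefined | single family | 2.0 | San Antonio | 1462 | 78218 | TX | 204900 | False | 2019 | 4.0 | 0.30 | 12.230277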